What is value—accumulated reward or evidence?

Authors

  • Karl J. Friston
  • Rick A. Adams
  • P. Read Montague
Abstract

Why are you reading this abstract? In some sense, your answer will cast the exercise as valuable, but what is value? In what follows, we suggest that value is evidence or, more exactly, log Bayesian evidence. This implies that a sufficient explanation for valuable behavior is the accumulation of evidence for internal models of our world. This contrasts with normative models of optimal control and reinforcement learning, which assume the existence of a value function that explains behavior, where (somewhat tautologically) behavior maximizes value. In this paper, we consider an alternative formulation, active inference, that replaces policies in normative models with prior beliefs about the (future) states agents should occupy. This enables optimal behavior to be cast purely in terms of inference: where agents sample their sensorium to maximize the evidence for their generative model of hidden states in the world, and minimize their uncertainty about those states. Crucially, this formulation resolves the tautology inherent in normative models and allows one to consider how prior beliefs are themselves optimized in a hierarchical setting. We illustrate these points by showing that any optimal policy can be specified with prior beliefs in the context of Bayesian inference. We then show how these prior beliefs are themselves prescribed by an imperative to minimize uncertainty. This formulation explains the saccadic eye movements required to read this text and defines the value of the visual sensations you are soliciting.
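
The identification of value with log Bayesian evidence can be made concrete with a toy example. The sketch below is not taken from the paper: the two-state model, its probabilities, and the function names are all illustrative assumptions. It computes the log evidence ln p(o) of an observation under a discrete generative model and checks the standard result that variational free energy is an upper bound on -ln p(o), with equality when the approximate belief q equals the exact posterior.

```python
import math

# Hypothetical two-state generative model (illustrative numbers only).
PRIOR = [0.7, 0.3]                      # p(s): prior beliefs over hidden states
LIKELIHOOD = [[0.9, 0.1], [0.2, 0.8]]   # LIKELIHOOD[s][o] = p(o | s)


def log_evidence(prior, likelihood, obs):
    """ln p(o) = ln sum_s p(o | s) p(s) for a discrete model."""
    return math.log(sum(p * likelihood[s][obs] for s, p in enumerate(prior)))


def posterior(prior, likelihood, obs):
    """Exact Bayesian posterior p(s | o)."""
    joint = [p * likelihood[s][obs] for s, p in enumerate(prior)]
    total = sum(joint)
    return [j / total for j in joint]


def free_energy(q, prior, likelihood, obs):
    """F(q) = E_q[ln q(s) - ln p(o, s)], an upper bound on -ln p(o)."""
    return sum(
        qs * (math.log(qs) - math.log(prior[s] * likelihood[s][obs]))
        for s, qs in enumerate(q)
        if qs > 0
    )


if __name__ == "__main__":
    for obs in (0, 1):
        value = log_evidence(PRIOR, LIKELIHOOD, obs)
        bound = free_energy(posterior(PRIOR, LIKELIHOOD, obs), PRIOR, LIKELIHOOD, obs)
        print(f"obs={obs}  log evidence={value:.4f}  F at posterior={bound:.4f}")
```

Under this reading, an observation the model expects (here, observation 0) carries higher value than a surprising one, and an agent that minimizes free energy is implicitly maximizing a lower bound on that value.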

Similar resources

A New Methodology for Calculating Distributions of Reward Accumulated During a Finite Interval

Markov reward models are an important formalism by which to obtain dependability and performability measures of computer systems and networks. In this context, it is particularly important to determine the probability distribution function of the reward accumulated during a finite interval. The interval may correspond to the mission period in a mission-critical system, the time between scheduled...

Banburismus and the Brain Decoding the Relationship between Sensory Stimuli, Decisions, and Reward

This article relates a theoretical framework developed by British codebreakers in World War II to the neural computations thought to be responsible for forming categorical decisions about sensory stimuli. In both, a weight of evidence is computed and accumulated to support or oppose the alternative interpretations. A decision is reached when the evidence reaches a threshold value. In the codebr...

Decisions of Value: Going Backstage; Comment on “Contextual Factors Influencing Cost and Quality Decisions in Health and Care: A Structured Evidence Review and Narrative Synthesis”

This commentary expands on two of the key themes briefly raised in the paper involving analysis of the evidence about key contextual influences on decisions of value. The first theme focuses on the need to explore in more detail what is called backstage decision-making looking at how actual decisions are made drawing on evidence from ethnographies about decision-making. These studies point to l...

Computation of the Distribution of Accumulated Reward with Fluid Stochastic Petri-Nets

We describe the recently introduced Fluid Stochastic Petri-Nets as a means of computing the distribution of the accumulated rate reward in a GSPN. In practice, it is the expected value of a reward which is computed, a quantity which is dependent solely on the solution of the underlying Markov chain. Until now, the instantaneous reward rates have been a function of the GSPN marking only, and the...

The Current and Future Sales Impact of a Retail Frequency Reward Program

This research presents an empirical study of the impact of a retail frequency reward program on store sales. We examine both the "points-pressure," or short-term impact, and the "rewarded behavior," or long-term impact. The points-pressure impact is due to forward-looking customers increasing their purchase levels in order to earn the reward. The rewarded behavior impact is evidenced as purchas...

Journal title:

Volume 6, Issue

Pages -

Publication date: 2012